Round 1: Screening Test (1 Hour – PySpark Coding)
Environment: Coding done in a virtual lab provided by the company.
Task: Complex data transformation using PySpark.
Difficulty: High; the problem required chaining multiple transformations to reach the expected output.
Skills Tested:
- DataFrame operations
- Joins, window functions
- Handling nested structures, nulls, and schema enforcement (see the sketch after this list)
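To give a flavor of what such a task can look like, here is a minimal, self-contained sketch that chains these operations over an invented orders dataset with a nested customer struct. Every name and value is hypothetical, not the actual interview problem:

```python
from pyspark.sql import SparkSession, Window, functions as F
from pyspark.sql.types import (StructType, StructField, StringType,
                               IntegerType, DoubleType)

spark = SparkSession.builder.appName("screening-sketch").getOrCreate()

# Explicit schema enforcement for a fictional orders dataset with a
# nested customer struct; names and values are invented for illustration.
schema = StructType([
    StructField("order_id", IntegerType(), False),
    StructField("customer", StructType([
        StructField("id", StringType(), True),
        StructField("region", StringType(), True),
    ]), True),
    StructField("amount", DoubleType(), True),
])
orders = spark.createDataFrame(
    [(1, ("c1", "NA"), 120.0), (2, ("c2", "EU"), None), (3, ("c3", "NA"), 90.0)],
    schema,
)
regions = spark.createDataFrame(
    [("NA", "North America"), ("EU", "Europe")], ["region_code", "region_name"]
)

w = Window.partitionBy("region_code").orderBy(F.desc("amount"))

result = (
    orders
    .select(                                             # flatten the nested struct
        "order_id",
        F.col("customer.id").alias("customer_id"),
        F.col("customer.region").alias("region_code"),
        F.coalesce(F.col("amount"), F.lit(0.0)).alias("amount"),  # null handling
    )
    .join(regions, "region_code", "left")                # join with a lookup table
    .withColumn("rank_in_region", F.rank().over(w))      # window function
    .filter(F.col("rank_in_region") <= 2)                # top orders per region
)
result.show()
```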
Round 2: Technical + Project Discussion (Face-to-Face)
✅ SQL (5 Questions – Hard Level)
Advanced SQL involving:
- Multiple joins
- Window functions (LAG, LEAD, NTILE)
- CTEs and nested queries
- Aggregations with filtering (an example follows this list)
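A self-contained example in the spirit of those questions, run through spark.sql against a fictional sales table registered as a temp view; the table, columns, and data are assumptions, not the actual questions:

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sql-sketch").getOrCreate()

# Fictional sales data registered as a temp view so the query runs end to end.
spark.createDataFrame(
    [("EU", "2024-01-15", 120.0, "completed"),
     ("EU", "2024-02-10", 80.0, "completed"),
     ("NA", "2024-01-20", 200.0, "cancelled")],
    ["region", "sale_date", "amount", "status"],
).createOrReplaceTempView("sales")

query = """
WITH monthly AS (                           -- CTE
    SELECT region,
           date_trunc('month', CAST(sale_date AS DATE)) AS month,
           SUM(amount) AS revenue
    FROM sales
    WHERE status = 'completed'              -- aggregation with filtering
    GROUP BY region, date_trunc('month', CAST(sale_date AS DATE))
)
SELECT region, month, revenue,
       LAG(revenue)  OVER (PARTITION BY region ORDER BY month)   AS prev_revenue,
       NTILE(4)      OVER (PARTITION BY region ORDER BY revenue) AS revenue_quartile
FROM monthly
"""
spark.sql(query).show()
```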
✅ Project Discussion
Deep dive into past projects:
- Architecture
- Tooling (e.g., Spark, Delta Lake, Azure/AWS)
- Your role in data ingestion, transformation, and performance tuning
✅ PySpark (4 Coding Questions)
Real-world data manipulation using:
- groupBy, agg, window
- Conditional logic with when, otherwise
- Handling nulls and schema mismatches (sketched below)
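A short sketch combining these building blocks on a made-up events DataFrame; the column names, data, and the high/low engagement rule are all invented for illustration:

```python
from pyspark.sql import SparkSession, Window, functions as F

spark = SparkSession.builder.appName("pyspark-sketch").getOrCreate()

# Made-up events data; None in `duration` stands in for a real-world null.
events = spark.createDataFrame(
    [("u1", "click", 3), ("u1", "view", None), ("u2", "click", 7)],
    ["user_id", "event_type", "duration"],
)

cleaned = events.fillna({"duration": 0})             # null handling

labeled = cleaned.withColumn(                        # conditional logic
    "engagement",
    F.when(F.col("duration") >= 5, "high").otherwise("low"),
)

totals = labeled.groupBy("user_id").agg(             # groupBy + agg
    F.sum("duration").alias("total_duration")
)

w = Window.orderBy(F.desc("total_duration"))         # window function
totals.withColumn("rank", F.dense_rank().over(w)).show()
```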
✅ Spark Optimization Techniques
- Tuning Spark configurations for performance (see the sketch below)
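For context, a sketch of the kind of tuning that typically comes up in this sort of discussion: shuffle partitioning, adaptive query execution, broadcast joins, and caching. The specific values and tables here are illustrative, not recommendations:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import broadcast

spark = SparkSession.builder.appName("tuning-sketch").getOrCreate()

# Commonly tuned settings; the values below are illustrative.
spark.conf.set("spark.sql.shuffle.partitions", "200")          # shuffle parallelism
spark.conf.set("spark.sql.adaptive.enabled", "true")           # adaptive query execution
spark.conf.set("spark.sql.autoBroadcastJoinThreshold",
               str(10 * 1024 * 1024))                          # broadcast tables under 10 MB

# Hypothetical fact/dimension tables to demonstrate a broadcast join hint.
facts = spark.range(1_000_000).withColumnRenamed("id", "key")
dims = spark.createDataFrame([(0, "a"), (1, "b")], ["key", "label"])

joined = facts.join(broadcast(dims), "key")  # avoid shuffling the small side
joined.cache()                               # reuse without recomputation
joined.count()
```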
Round 3: HR
- Salary discussion, preferred location, and other logistics.